Abstract

After the discovery that fixed points of loopy belief propagation coincide with stationary points of the Bethe free energy, several researchers proposed provably convergent algorithms to directly minimize the Bethe free energy. These algorithms were formulated only for non-zero temperature (thus finding fixed points of the sum-product algorithm), and their possible extension to zero temperature is not obvious. We present the zero-temperature limit of the double-loop algorithm by Heskes, which converges to a max-product fixed point. The inner loop of this algorithm is max-sum diffusion. Under certain conditions, the algorithm combines the complementary advantages of max-product belief propagation and max-sum diffusion (LP relaxation): it yields good approximations of both ground states and max-marginals.

1 Introduction 

Loopy belief propagation (BP) [17] is a well-known algorithm to approximate marginals of the Gibbs distribution defined by an undirected graphical model. For acyclic graphs, BP always converges and yields the exact marginals. For graphs with cycles, it is not guaranteed to converge, but when it does, it often yields surprisingly good approximations of the true marginals. One informal argument for this is that at a BP fixed point, marginals are exact in every subtree of the factor graph [23] [24]. Attempts to understand loopy BP have generated a large body of literature; see e.g. the survey [?].

BP has a variant, known as max-product BP, in which summations are replaced with maximizations. In statistical mechanics terminology, this can be understood as the zero-temperature limit of ordinary BP. Max-product BP computes (or approximates) max-marginals rather than ordinary marginals.

After the discovery [?] [33] that BP fixed points coincide with stationary points of the Bethe free energy, several researchers proposed provably convergent algorithms to find a local minimum of the Bethe free energy [35] [28] [22] [5] [6]. These algorithms have been proposed only for sum-product BP, and their possible extension to max-product BP is not obvious.
We reformulate the double-loop algorithm [5] by Heskes such that taking its zero-temperature limit becomes straightforward, which results in an algorithm that always converges to a max-product BP fixed point. The inner loop of the algorithm is max-sum diffusion [13] [11] [31] [2]. We empirically observed that with a uniform initialization, the algorithm always yielded the same approximation of ground states as would be obtained by max-sum diffusion (or other algorithms for MAP inference based on LP relaxation, such as TRW-S [12]). Thus, it combines the complementary advantages of max-sum belief propagation and LP relaxation: unlike the former, it yields a good approximation of ground states and, unlike the latter, it yields a good approximation of max-marginals.

The text is organized as follows. We first (§2) review the basics of inference in graphical models. We thoroughly discuss the zero-temperature limit of the Gibbs distribution and related quantities and how to approximate them by variational inference. Then we review two basic cases of variational inference, with a convex free energy (§3) and with the Bethe free energy (§4). Then (§3.1, §4.1) we discuss their zero-temperature limits in detail. Finally (§5) we reformulate the double-loop algorithm [5] and modify it for zero temperature.

2 Gibbs distribution 

Let $V$ be a set of variables, each variable $v\in V$ taking states $x_v$ from a finite domain $X_v$. An assignment to a variable subset $a\subseteq V$ is $x_a\in X_a$, where $X_a$ is the Cartesian product of the domains $X_v$ for $v\in a$. In particular, $x_V\in X_V$ is an assignment to all the variables. Let $E\subseteq 2^V$, thus $(V,E)$ is a hypergraph. Each variable $v\in V$ and hyperedge $a\in E$ is assigned a potential function $\theta_v\colon X_v\to\bar{\mathbb R}$ and $\theta_a\colon X_a\to\bar{\mathbb R}$, respectively, where $\bar{\mathbb R}=\mathbb R\cup\{-\infty\}$. All numbers $\theta_v(x_v)$ and $\theta_a(x_a)$ are understood as a single vector $\theta\in\bar{\mathbb R}^I$ (or mapping $\theta\colon I\to\bar{\mathbb R}$) with

$$ I=\{(v,x_v)\mid v\in V,\ x_v\in X_v\}\cup\{(a,x_a)\mid a\in E,\ x_a\in X_a\}. $$

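The Gibbs probability distribution over the hypergraph $(V,E)$ is given by

$$ p(x_V)=\exp\big[\langle\theta,\delta(x_V)\rangle-\Phi(\theta)\big], \tag{1}$$

where the mapping $\delta\colon X_V\to\{0,1\}^I$ is such that

$$ \langle\theta,\delta(x_V)\rangle=\sum_{v\in V}\theta_v(x_v)+\sum_{a\in E}\theta_a(x_a). \tag{2}$$

For infinite weights, we set $-\infty\cdot 0=0$ in the scalar product $\langle\theta,\delta(x_V)\rangle$. Since unary terms are included in $\theta$ explicitly, we assume that $E$ contains no singletons. The distribution is normalized by the log-partition function

$$ \Phi(\theta)=\log\sum_{x_V}\exp\langle\theta,\delta(x_V)\rangle=\bigoplus_{x_V}\langle\theta,\delta(x_V)\rangle. \tag{3}$$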






In (3), we used $x\oplus y=\log(e^x+e^y)$ to denote the log-sum-exp operation. It will be useful to keep in mind the algebraic properties of this operation. It is associative and commutative, and addition distributes over it. Thus, $(\bar{\mathbb R},\oplus,+)$ is a commutative semiring. This semiring is, via the logarithm map, isomorphic to the 'sum-product' semiring $(\mathbb R_+,+,\times)$.
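To make (1)-(3) and the semiring remark concrete, here is a brute-force sketch on a hypothetical three-variable chain with pairwise hyperedges (the model, the helper names `oplus` and `score`, and all numbers are our own illustrative assumptions). It evaluates $\langle\theta,\delta(x_V)\rangle$ by (2), the log-partition function by (3) with a numerically stable $\oplus$, and checks that addition distributes over $\oplus$. Enumeration is exponential and only meant for tiny examples.

```python
import itertools
import math


def oplus(xs):
    """Log-sum-exp: the semiring 'addition' x (+) y = log(e^x + e^y), over an iterable.

    -inf is the neutral element, so an all -inf input returns -inf.
    """
    xs = list(xs)
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


# Hypothetical toy model: chain 0 - 1 - 2 with binary variables (pairwise hyperedges only).
K = [2, 2, 2]                                   # domain sizes |X_v|
E = [(0, 1), (1, 2)]                            # hyperedges, no singletons
theta_v = [[0.0, 0.5], [0.2, -0.3], [1.0, 0.0]]
theta_a = {(0, 1): [[1.0, -1.0], [-1.0, 1.0]],  # theta_a[a][x_u][x_w]
           (1, 2): [[0.0, 0.7], [0.7, 0.0]]}


def score(x):
    """<theta, delta(x_V)> as in eq. (2)."""
    return (sum(theta_v[v][x[v]] for v in range(len(K)))
            + sum(theta_a[a][x[a[0]]][x[a[1]]] for a in E))


X = list(itertools.product(*(range(k) for k in K)))   # all assignments x_V
Phi = oplus(score(x) for x in X)                       # eq. (3)
p = {x: math.exp(score(x) - Phi) for x in X}           # eq. (1)

# Addition distributes over (+), and (+) is commutative.
v1, v2, v3 = 0.3, -1.2, 2.0
assert math.isclose(v3 + oplus([v1, v2]), oplus([v3 + v1, v3 + v2]))
assert math.isclose(oplus([v1, v2]), oplus([v2, v1]))
print("Phi =", Phi, " sum of p =", sum(p.values()))
```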



Marginals. The marginals of the distribution are

$$ \mu_v(x_v)=\sum_{x_{V\setminus v}}p(x_V),\qquad \mu_a(x_a)=\sum_{x_{V\setminus a}}p(x_V), \tag{4}$$

where we abuse notation by writing $V\setminus v$ instead of $V\setminus\{v\}$. The numbers $\mu$ are understood as a vector $\mu\in[0,1]^I$. All realizable marginal vectors $\mu$ form the marginal polytope $\operatorname{conv}\delta(X_V)$, where $\delta(X_V)=\{\delta(x_V)\mid x_V\in X_V\}$. Besides (an exponential number of) other constraints, $\mu$ satisfies the normalization and marginalization constraints

$$ \sum_{x_v}\mu_v(x_v)=\sum_{x_a}\mu_a(x_a)=1,\qquad \sum_{x_{a\setminus v}}\mu_a(x_a)=\mu_v(x_v). \tag{5}$$

All vectors $\mu\ge 0$ satisfying (5) form the local marginal polytope $\Lambda$. We have $\operatorname{conv}\delta(X_V)\subseteq\Lambda$, with equality if and only if the hypergraph $(V,E)$ is acyclic (i.e., its factor graph is a tree). We also introduce a symbol for log-marginals,

$$ \nu_a(x_a)=\log\mu_a(x_a)=\bigoplus_{x_{V\setminus a}}\langle\theta,\delta(x_V)\rangle-\Phi(\theta) \tag{6}$$

(and similarly for $\nu_v(x_v)$). For log-marginals, the constraints read

$$ \bigoplus_{x_v}\nu_v(x_v)=\bigoplus_{x_a}\nu_a(x_a)=0,\qquad \bigoplus_{x_{a\setminus v}}\nu_a(x_a)=\nu_v(x_v). \tag{7}$$
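The marginals (4) and the constraints (5) and (7) can be checked by brute force on a small model. The sketch below uses a hypothetical toy chain with pairwise hyperedges (one variable has three states); it computes $\mu_v$ and $\mu_a$ by summing $p(x_V)$ over all consistent assignments and asserts the local normalization and marginalization constraints. It is an exponential-time illustration, not an inference method.

```python
import itertools
import math


def oplus(xs):
    """Stable log-sum-exp of an iterable."""
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


# Hypothetical toy chain 0 - 1 - 2; variable 1 has three states.
K = [2, 3, 2]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 0.5], [0.2, -0.3, 0.4], [1.0, 0.0]]
th_a = {(0, 1): [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]],
        (1, 2): [[0.0, 0.7], [0.7, 0.0], [0.3, -0.2]]}
X = list(itertools.product(*(range(k) for k in K)))


def score(x):
    return sum(th_v[v][x[v]] for v in range(len(K))) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)


Phi = oplus(score(x) for x in X)
p = {x: math.exp(score(x) - Phi) for x in X}

# Marginals (4): sum p(x_V) over all assignments agreeing with x_v (resp. x_a).
mu_v = [[sum(p[x] for x in X if x[v] == s) for s in range(K[v])] for v in range(len(K))]
mu_a = {a: [[sum(p[x] for x in X if x[a[0]] == s and x[a[1]] == t) for t in range(K[a[1]])]
            for s in range(K[a[0]])] for a in E}

# Constraints (5): normalization and marginalization for every pair v in a.
for (u, w), m in mu_a.items():
    assert math.isclose(sum(map(sum, m)), 1.0)
    for s in range(K[u]):
        assert math.isclose(sum(m[s]), mu_v[u][s])
    for t in range(K[w]):
        assert math.isclose(sum(m[s][t] for s in range(K[u])), mu_v[w][t])

# Constraints (7) on log-marginals: (+) over x_v gives 0.
nu_v = [[math.log(q) for q in row] for row in mu_v]
assert all(abs(oplus(row)) < 1e-12 for row in nu_v)
print("vertex marginals:", mu_v)
```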



Reparameterizations. A reparameterization is an affine transformation of the vector $\theta$ that preserves $\langle\theta,\delta(x_V)\rangle$ for all assignments $x_V\in X_V$. We first define the local reparameterization on a pair $(a,v)$ as follows: subtract an arbitrary unary function $\alpha_{av}\colon X_v\to\mathbb R$ from $\theta_v$ and add the same function to $\theta_a$,

$$ \theta_v(x_v)\leftarrow\theta_v(x_v)-\alpha_{av}(x_v),\qquad \theta_a(x_a)\leftarrow\theta_a(x_a)+\alpha_{av}(x_v). \tag{8}$$

This preserves (2) because $\alpha_{av}$ cancels out. We understand (8) as 'passing a message' $\alpha_{av}$. Applying local reparameterizations to all pairs $(a,v)$ with $v\in a\in E$ yields the general reparameterization

$$ \theta^\alpha_v(x_v)=\theta_v(x_v)-\sum_{a\in E,\,a\ni v}\alpha_{av}(x_v),\qquad \theta^\alpha_a(x_a)=\theta_a(x_a)+\sum_{v\in a}\alpha_{av}(x_v), \tag{9}$$

where $\alpha=\{\alpha_{av}(x_v)\mid v\in a\in E,\ x_v\in X_v\}$ is the vector of all messages and the transformed vector $\theta$ is denoted $\theta^\alpha\in\bar{\mathbb R}^I$. Thus $\langle\theta^\alpha,\delta(x_V)\rangle=\langle\theta,\delta(x_V)\rangle$. In fact, we have more generally $\langle\theta^\alpha,\mu\rangle=\langle\theta,\mu\rangle$ for all $\mu$ satisfying (5) and all $\alpha$.






Reparameterizations can be done either by directly modifying the vector $\theta$ or by keeping $\theta$ unchanged and storing the messages $\alpha$. While the former may be better for theoretical analysis, the latter is preferable in algorithms. In the sequel we freely switch between these two views.
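A quick numerical check of (9): the sketch below (a hypothetical random pairwise model on a 3-cycle, random messages $\alpha$, and helper names of our own choosing) keeps $\theta$ unchanged, stores the messages separately, builds $\theta^\alpha$ from them, and confirms that $\langle\theta^\alpha,\delta(x_V)\rangle=\langle\theta,\delta(x_V)\rangle$ for every assignment.

```python
import itertools
import random

random.seed(0)
K = [2, 3, 2]
E = [(0, 1), (1, 2), (0, 2)]       # a 3-cycle; reparameterization does not need a tree
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}


def reparameterize(th_v, th_a, alpha):
    """Return theta^alpha of eq. (9); alpha[(a, v)][x_v] is the message on pair (a, v)."""
    new_v = [row[:] for row in th_v]
    new_a = {a: [row[:] for row in m] for a, m in th_a.items()}
    for (a, v), msg in alpha.items():
        u, w = a
        for s in range(K[v]):
            new_v[v][s] -= msg[s]                              # subtract from theta_v
        for s in range(K[u]):
            for t in range(K[w]):
                new_a[a][s][t] += (msg[s] if v == u else msg[t])   # add to theta_a
    return new_v, new_a


def score(tv, ta, x):
    return sum(tv[v][x[v]] for v in range(len(K))) + sum(ta[a][x[a[0]]][x[a[1]]] for a in E)


alpha = {(a, v): [random.gauss(0, 1) for _ in range(K[v])] for a in E for v in a}
rv, ra = reparameterize(th_v, th_a, alpha)
max_diff = max(abs(score(rv, ra, x) - score(th_v, th_a, x))
               for x in itertools.product(*(range(k) for k in K)))
print("largest change of <theta, delta(x)>:", max_diff)   # should be zero up to rounding
```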



2.1 Zero-temperature limit 

In this section, we will use $p(x_V\mid\theta)$, $\mu_v(x_v\mid\theta)$ and $\mu_a(x_a\mid\theta)$ to explicitly denote the dependence of distribution (1) and its marginals on $\theta$.

In statistical physics, the Gibbs distribution is usually considered in the more general form $p(x_V\mid\beta\theta)$, where $\beta>0$ is the inverse temperature [?]. The limit $\beta\to\infty$ is then known as the zero-temperature limit.

It is elementary to show that the distribution

$$ p^\infty(x_V\mid\theta)=\lim_{\beta\to\infty}p(x_V\mid\beta\theta) \tag{10}$$

is zero everywhere except at ground states, which are the maximizers of $p(x_V\mid\theta)$ or, equivalently, of $\langle\theta,\delta(x_V)\rangle$. If there are multiple ground states, then the mass is distributed evenly among them.

The zero-temperature limit of the log-partition function is

$$ \Phi^\infty(\theta)=\lim_{\beta\to\infty}\frac{\Phi(\beta\theta)}{\beta}=\max_{x_V}\langle\theta,\delta(x_V)\rangle, \tag{11}$$

which follows from the limit

$$ \lim_{\beta\to\infty}\frac{\beta x\oplus\beta y}{\beta}=\max\{x,y\}. \tag{12}$$

The zero-temperature limit of log-marginals (6) yields max-marginals¹,²

$$ \nu^\infty_a(x_a)=\lim_{\beta\to\infty}\frac{\nu_a(x_a\mid\beta\theta)}{\beta}=\max_{x_{V\setminus a}}\langle\theta,\delta(x_V)\rangle-\Phi^\infty(\theta) \tag{13}$$

(similarly for $\nu^\infty_v(x_v)$). Observe that (13) and (11) differ from (6) and (3) only by replacing the log-sum-exp operation $\oplus$ with max. By the limit (12), this corresponds to the transition from the semiring $(\bar{\mathbb R},\oplus,+)$ to the max-sum semiring $(\bar{\mathbb R},\max,+)$. Similarly, max-marginals satisfy the normalization and marginalization conditions (7) in which $\oplus$ has been replaced with max.

Max-marginals should not be confused with the marginals of $p^\infty(x_V\mid\theta)$. These are different quantities and one cannot be computed from the other.
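The limits (11) and (13) can be observed numerically. The following sketch (a hypothetical toy chain; enumeration only) computes $\Phi^\infty$ and the max-marginals $\nu^\infty_a$ for one hyperedge by replacing $\oplus$ with max, and compares them with $\Phi(\beta\theta)/\beta$ and $\nu_a(x_a\mid\beta\theta)/\beta$ for a large $\beta$. Since $0\le\Phi(\beta\theta)/\beta-\Phi^\infty(\theta)\le\log|X_V|/\beta$, a loose tolerance suffices.

```python
import itertools
import math


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


K = [2, 2, 2]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 0.5], [0.2, -0.3], [1.0, 0.0]]
th_a = {(0, 1): [[1.0, -1.0], [-1.0, 1.0]], (1, 2): [[0.0, 0.7], [0.7, 0.0]]}
X = list(itertools.product(*(range(k) for k in K)))


def score(x):
    return sum(th_v[v][x[v]] for v in range(3)) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)


a = (0, 1)                                             # the hyperedge we inspect
X_a = list(itertools.product(range(K[a[0]]), range(K[a[1]])))

# Zero temperature: replace (+) by max, eqs. (11) and (13).
Phi_inf = max(score(x) for x in X)
nu_inf = {xa: max(score(x) for x in X if (x[a[0]], x[a[1]]) == xa) - Phi_inf for xa in X_a}

# Finite but large beta: log-marginals of p(. | beta*theta), divided by beta.
beta = 200.0
Phi_beta = oplus(beta * score(x) for x in X)
nu_beta = {xa: (oplus(beta * score(x) for x in X if (x[a[0]], x[a[1]]) == xa) - Phi_beta) / beta
           for xa in X_a}

print("Phi_inf =", Phi_inf, " Phi(beta*theta)/beta =", Phi_beta / beta)
for xa in X_a:
    print(xa, nu_inf[xa], nu_beta[xa])
```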



¹ It would be more precise to call (13) 'max-log-marginals' or 'log-max-marginals'. But, as is usual in the literature, we call them only 'max-marginals'.

² Unlike the limit (11), the limit (13) from (log-)marginals to max-marginals rarely appears in the machine learning or computer vision literature. The only work we found is [?].
Recovering ground states from max-marginals. Ground states can be recovered from max-marginals. To show this, we first recall the constraint satisfaction problem (CSP) [15] [1]. A CSP instance is defined by a vector $c\in\{0,1\}^I$, where the functions $c_v\colon X_v\to\{0,1\}$ and $c_a\colon X_a\to\{0,1\}$ are understood as relations. A solution of the CSP is an assignment $x_V$ satisfying all the relations, i.e., $c_v(x_v)=1$ for all $v\in V$ and $c_a(x_a)=1$ for all $a\in E$. For a vector $\theta\in\bar{\mathbb R}^I$, we define the vector $\lceil\theta\rceil\in\{0,1\}^I$ by

$$ \lceil\theta\rceil_v(x_v)=\begin{cases}1&\text{if } x_v\in\operatorname*{argmax}_{y_v}\theta_v(y_v),\\0&\text{otherwise,}\end{cases}\qquad \lceil\theta\rceil_a(x_a)=\begin{cases}1&\text{if } x_a\in\operatorname*{argmax}_{y_a}\theta_a(y_a),\\0&\text{otherwise,}\end{cases}$$

i.e., a component of $\lceil\theta\rceil$ equals 1 iff the corresponding component of $\theta$ is maximal in its potential function. We say that such components of $\theta$ are active. Now every ground state in $\operatorname*{argmax}_{x_V}\langle\theta,\delta(x_V)\rangle$ is a solution of the CSP defined by the vector $\lceil\nu^\infty\rceil$ of active max-marginals, and if the ground state is unique, it is the only solution of this CSP.
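The construction of $\lceil\cdot\rceil$ and the decoding of ground states from active max-marginals can be sketched by brute force. The toy chain below is hypothetical, with attractive couplings chosen so that there are two ground states; the code computes the exact max-marginals, marks active components with a small tolerance `TOL` (our own choice, to absorb floating-point ties), and enumerates the solutions of the resulting CSP. On this acyclic example they coincide with the ground states.

```python
import itertools

TOL = 1e-9   # components within TOL of the maximum count as active (floating-point ties)

K = [2, 2, 2]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 0.0], [0.1, 0.1], [0.0, 0.0]]
th_a = {(0, 1): [[1.0, 0.0], [0.0, 1.0]], (1, 2): [[0.5, 0.0], [0.0, 0.5]]}
X = list(itertools.product(*(range(k) for k in K)))


def score(x):
    return sum(th_v[v][x[v]] for v in range(3)) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)


Phi_inf = max(score(x) for x in X)
ground = {x for x in X if score(x) >= Phi_inf - TOL}

# Max-marginals (13) and their active components, i.e. the CSP defined by ceil(nu_inf).
c_v = {(v, s): max(score(x) for x in X if x[v] == s) - Phi_inf >= -TOL
       for v in range(3) for s in range(K[v])}
c_a = {(a, s, t): max(score(x) for x in X if (x[a[0]], x[a[1]]) == (s, t)) - Phi_inf >= -TOL
       for a in E for s in range(K[a[0]]) for t in range(K[a[1]])}

# Solutions of the CSP: assignments satisfying every unary and every hyperedge relation.
solutions = {x for x in X
             if all(c_v[(v, x[v])] for v in range(3))
             and all(c_a[(a, x[a[0]], x[a[1]])] for a in E)}
print("ground states:", sorted(ground), " CSP solutions:", sorted(solutions))
```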



2.2 Convex conjugacy and variational inference 

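Let $H(\mu)$ denote the entropy of the distribution (1) as a function of its marginals. The functions $\Phi(\theta)$ and $-H(\mu)$ are convex and they are related by convex conjugacy,

$$ \Phi(\theta)=\max_{\mu\in\operatorname{conv}\delta(X_V)}\big[\langle\theta,\mu\rangle+H(\mu)\big], \tag{14}$$

where the optimum is attained for $\mu$ equal to the marginals (4). In statistical physics, the quantity $-\langle\theta,\mu\rangle-H(\mu)$ is known as the Gibbs free energy of the system. By taking the limit $\beta\to\infty$ of the expression

$$ \frac{\Phi(\beta\theta)}{\beta}=\max_{\mu\in\operatorname{conv}\delta(X_V)}\Big[\langle\theta,\mu\rangle+\frac{H(\mu)}{\beta}\Big], \tag{15}$$

we similarly obtain $\Phi^\infty(\theta)$ and the max-marginals $\nu^\infty$.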
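Relation (14) can be checked at the level of distributions: since $\langle\theta,\mu\rangle$ is linear in $\mu$, it equals the expected score $\mathbb E_p\langle\theta,\delta(x_V)\rangle$, and at the optimum $H$ is the entropy of $p$. The sketch below (hypothetical toy chain; helper names `objective` and `gibbs` are ours) verifies that the Gibbs distribution attains $\Phi(\theta)$, that another distribution $q$ gives a smaller value, and that the bracket in (15) at the optimum equals $\Phi(\beta\theta)/\beta$, which is close to $\Phi^\infty(\theta)$ for large $\beta$.

```python
import itertools
import math
import random


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


K = [2, 2, 2]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 0.5], [0.2, -0.3], [1.0, 0.0]]
th_a = {(0, 1): [[1.0, -1.0], [-1.0, 1.0]], (1, 2): [[0.0, 0.7], [0.7, 0.0]]}
X = list(itertools.product(*(range(k) for k in K)))


def score(x):
    return sum(th_v[v][x[v]] for v in range(3)) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)


def objective(q, beta=1.0):
    """<theta, mu> + H/beta for a distribution q over X_V: the bracket in (14)/(15)."""
    expected = sum(q[x] * score(x) for x in X)
    entropy = -sum(q[x] * math.log(q[x]) for x in X if q[x] > 0)
    return expected + entropy / beta


def gibbs(beta):
    """Gibbs distribution p(. | beta*theta) and its log-partition function."""
    logZ = oplus(beta * score(x) for x in X)
    return {x: math.exp(beta * score(x) - logZ) for x in X}, logZ


p, Phi = gibbs(1.0)
random.seed(1)
w = {x: random.random() for x in X}
q = {x: w[x] / sum(w.values()) for x in X}     # some other distribution

p200, Phi200 = gibbs(200.0)
print(Phi, objective(p), objective(q))
print(Phi200 / 200.0, objective(p200, 200.0), max(score(x) for x in X))
```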

The idea behind variational inference [25] is to replace the marginal polytope $\operatorname{conv}\delta(X_V)$ and the entropy $H(\mu)$ in (14) with their tractable approximations. Then the optimal value and the optimal argument of (14) are approximations of the log-partition function and the marginals, respectively. For $\beta\to\infty$,

• the optimal value of (15) is an approximation of $\Phi^\infty(\theta)$,

• the logarithm of the optimal argument of (15), divided by $\beta$ as in (13), is an approximation of the max-marginals,

• the solution set of the CSP defined by the active approximate max-marginals is an approximation of the set $\operatorname*{argmax}_{x_V}\langle\theta,\delta(x_V)\rangle$ of ground states.

As the entropy term in (15) approaches 0 for $\beta\to\infty$, one may think that it could simply be omitted. However, as pointed out in [27], if the approximate entropy is non-concave (such as the Bethe entropy), problem (15) can have multiple local maxima for arbitrarily large $\beta$. Thus, if our algorithm finds only a local maximum of (15), the entropy term is crucial.
3 Convex free energy

Let the marginal polytope in (14) be approximated by the local marginal polytope $\Lambda$ and the true entropy by $H(\mu)\approx-\langle\log\mu,\mu\rangle$. This entropy approximation is concave, thus we obtain a simple (arguably, the simplest possible) variational inference method with a convex free energy [37] [?]. The approximation of (14) now reads

$$ \max_{\mu\in\Lambda}\langle\theta-\log\mu,\mu\rangle. \tag{16}$$

Problem (16) can be solved as described e.g. in [32]. Its dual reads as follows: find a reparameterization of the original vector $\theta$ that minimizes the function

$$ U(\theta)=\sum_{v\in V}\bigoplus_{x_v}\theta_v(x_v)+\sum_{a\in E}\bigoplus_{x_a}\theta_a(x_a). \tag{17}$$

This is a majorant of the log-partition function, $U(\theta)\ge\Phi(\theta)$ for every $\theta$. A sufficient condition for dual optimality is that

$$ \bigoplus_{x_{a\setminus v}}\theta_a(x_a)=\theta_v(x_v) \tag{18}$$

for all $v\in a\in E$ and $x_v\in X_v$. The primal and dual optima are related by

$$ \log\mu_v(x_v)=\theta_v(x_v)-\bigoplus_{y_v}\theta_v(y_v),\qquad \log\mu_a(x_a)=\theta_a(x_a)-\bigoplus_{y_a}\theta_a(y_a). \tag{19}$$

Since function (17) is convex and differentiable, its global minimum over reparameterizations of $\theta$ can be found by coordinate descent. This leads to a simple message-passing algorithm. An iteration of this algorithm enforces equality (18) for a single pair $(a,v)$ by the local reparameterization (8), which determines $\alpha_{av}(x_v)$ in (8) uniquely. The iteration decreases $U(\theta)$, and this decrease is strict unless (18) already holds for the pair $(a,v)$. On convergence, (18) holds globally.
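The bound $U(\theta)\ge\Phi(\theta)$ holds for every reparameterization, which is what makes minimizing $U$ over reparameterizations meaningful. The sketch below (hypothetical random model on a 3-cycle; function names are ours) evaluates $U$ from (17) on the original $\theta$ and on randomly reparameterized vectors $\theta^\alpha$ built by (9); every value stays above $\Phi(\theta)$, which itself is invariant under reparameterization.

```python
import itertools
import math
import random


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


random.seed(2)
K = [2, 2, 3]
E = [(0, 1), (1, 2), (0, 2)]
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}


def U(tv, ta):
    """Eq. (17): sum of log-sum-exps of all potential functions."""
    return (sum(oplus(row) for row in tv)
            + sum(oplus(x for row in m for x in row) for m in ta.values()))


def reparameterize(alpha):
    """theta^alpha from eq. (9)."""
    tv = [[th_v[v][s] - sum(alpha[(a, v)][s] for a in E if v in a) for s in range(K[v])]
          for v in range(len(K))]
    ta = {a: [[th_a[a][s][t] + alpha[(a, a[0])][s] + alpha[(a, a[1])][t] for t in range(K[a[1]])]
              for s in range(K[a[0]])] for a in E}
    return tv, ta


Phi = oplus(sum(th_v[v][x[v]] for v in range(len(K))) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)
            for x in itertools.product(*(range(k) for k in K)))
bounds = [U(th_v, th_a)]
for _ in range(20):
    alpha = {(a, v): [random.gauss(0, 1) for _ in range(K[v])] for a in E for v in a}
    bounds.append(U(*reparameterize(alpha)))
print("Phi =", Phi, " smallest U over the samples =", min(bounds))
```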

If reparameterizations are represented by messages rather than by directly modifying $\theta$, the dual of (16) reads $\min_\alpha U(\theta^\alpha)$ and the coordinate descent becomes Algorithm 1. To correctly handle infinite weights, the algorithm expects that $\theta_v(x_v)>-\infty$ if and only if $\max_{x_{a\setminus v}}\theta_a(x_a)>-\infty$, for all $v\in a\in E$ and $x_v\in X_v$.

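Algorithm 1 The 'diffusion' algorithm.
repeat
  for $v\in a\in E$ and $x_v\in X_v$ such that $\theta_v(x_v)>-\infty$ do
    $\alpha_{av}(x_v)\leftarrow\alpha_{av}(x_v)+\tfrac12\big[\theta^\alpha_v(x_v)-\bigoplus_{x_{a\setminus v}}\theta^\alpha_a(x_a)\big]$
  end for
until convergence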
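A minimal Python sketch of Algorithm 1, assuming pairwise hyperedges and finite weights (so the condition $\theta_v(x_v)>-\infty$ always holds). Messages $\alpha$ are stored separately and $\theta^\alpha$ is recomputed on demand, following the message view of §2; a fixed number of sweeps stands in for 'until convergence'. Passing `op=max` instead of the log-sum-exp gives the zero-temperature variant of §3.1. The names `diffusion` and `sweeps` are our own, not from the paper.

```python
import math
import random


def oplus(xs):
    """Stable log-sum-exp of an iterable (the semiring addition)."""
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


def diffusion(K, E, th_v, th_a, op=oplus, sweeps=500):
    """Sketch of Algorithm 1 for pairwise hyperedges with finite weights.

    Returns (vertex part, edge part) of theta^alpha and the value of U after each sweep.
    """
    alpha = {(a, v): [0.0] * K[v] for a in E for v in a}

    def rep_v(v, s):                   # theta^alpha_v(x_v), eq. (9)
        return th_v[v][s] - sum(alpha[(a, v)][s] for a in E if v in a)

    def rep_a(a, s, t):                # theta^alpha_a(x_a), eq. (9)
        return th_a[a][s][t] + alpha[(a, a[0])][s] + alpha[(a, a[1])][t]

    def slice_a(a, v, s):              # all theta^alpha_a(x_a) with x_v = s
        u, w = a
        if v == u:
            return [rep_a(a, s, t) for t in range(K[w])]
        return [rep_a(a, r, s) for r in range(K[u])]

    def current():
        tv = [[rep_v(v, s) for s in range(K[v])] for v in range(len(K))]
        ta = {a: [[rep_a(a, s, t) for t in range(K[a[1]])] for s in range(K[a[0]])] for a in E}
        return tv, ta

    def U(tv, ta):                     # eq. (17); with op=max its zero-temperature version
        return (sum(op(row) for row in tv)
                + sum(op([x for row in m for x in row]) for m in ta.values()))

    history = [U(*current())]
    for _ in range(sweeps):
        for a in E:
            for v in a:
                for s in range(K[v]):  # enforce (18) for the pair (a, v), one x_v at a time
                    alpha[(a, v)][s] += 0.5 * (rep_v(v, s) - op(slice_a(a, v, s)))
        history.append(U(*current()))
    return current(), history


# Usage on a hypothetical random 3-cycle.
random.seed(3)
K = [2, 3, 2]
E = [(0, 1), (1, 2), (0, 2)]
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}
(tv, ta), history = diffusion(K, E, th_v, th_a)

res = 0.0                              # largest violation of (18)
for (u, w), m in ta.items():
    for s in range(K[u]):
        res = max(res, abs(oplus(m[s]) - tv[u][s]))
    for t in range(K[w]):
        res = max(res, abs(oplus([m[s][t] for s in range(K[u])]) - tv[w][t]))
print("U before:", history[0], " after:", history[-1], " residual of (18):", res)
```

Updating $\alpha_{av}(x_v)$ for one $x_v$ at a time, as in the pseudocode, touches disjoint slices of $\theta^\alpha_a$, so a full pass over $x_v$ for a pair $(a,v)$ equals one block update of that pair.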
3.1 Zero-temperature limit: max-sum diffusion

The zero-temperature limit of the optimization problem above is obtained by replacing $\theta$ with $\beta\theta$ and taking the limit $\beta\to\infty$. This results in replacing $\oplus$ with max in (17)-(19) and in Algorithm 1. We assume that this has been done.

This yields the LP relaxation approach to maximizing the Gibbs distribution, first proposed by Schlesinger et al. [19] [13]; see also [29] [32] [31] [2]. In these works, the zero-temperature limit of Algorithm 1 is called max-sum diffusion.

Let function (17) after replacing $\oplus$ with max be denoted by

$$ U^\infty(\theta)=\sum_{v\in V}\max_{x_v}\theta_v(x_v)+\sum_{a\in E}\max_{x_a}\theta_a(x_a). $$

We have $U^\infty(\theta)\ge\Phi^\infty(\theta)$ for every $\theta$. Algorithm 1 tries to minimize $U^\infty(\theta)$ by reparameterizing $\theta$. However, the function $U^\infty$ is now non-differentiable; therefore, Algorithm 1 may converge only to a local (with respect to coordinate moves) minimum of $U^\infty(\theta)$. While it is easy to prove convergence of the algorithm in value, convergence in argument is only a conjecture to date [29] [31], and only a weaker property has been proved recently [18].

According to §2.2, when $\theta$ is optimal, $U^\infty(\theta)$ is an approximation of $\Phi^\infty(\theta)$ and (19) is an approximation of the max-marginals $\nu^\infty$. Note that the approximate max-marginals (19) are, up to normalization, directly equal to $\theta$. Since the vector $\lceil\theta\rceil$ is not affected by normalization of $\theta$, the solution set of the CSP $\lceil\theta\rceil$ is an approximation of the set of ground states.

This is in agreement with [19] [29], where it is shown that the inequality $U^\infty(\theta)\ge\Phi^\infty(\theta)$ (and hence the LP relaxation) is tight if and only if the CSP defined by $\lceil\theta\rceil$ has a solution. Then $\langle\theta,\delta(x_V)\rangle=\Phi^\infty(\theta)$ for every solution $x_V$ of the CSP $\lceil\theta\rceil$. There are two important problem subclasses for which the LP relaxation is tight: if the hypergraph $(V,E)$ is acyclic or if the functions $\theta_a$ are (permuted) supermodular [29] [31] [11]. Besides, it is tight for many other instances met in applications. This makes the method very suitable for approximating ground states, which has also been observed empirically³ [?].

However, even when the LP relaxation is tight, the numbers (19) are a very poor approximation of max-marginals. They are inexact even for acyclic hypergraphs.
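For contrast with the message-based sketch of Algorithm 1, max-sum diffusion can also be written by directly modifying $\theta$ (the first view in §2). The sketch below (a hypothetical chain with three states per variable; pairwise hyperedges; names are ours) applies the zero-temperature local reparameterization pair by pair and records $U^\infty$ after each sweep. We only assert properties that hold for any input: $U^\infty$ never increases, it never falls below $\Phi^\infty$, and $\langle\theta,\delta(x_V)\rangle$ is preserved. On this acyclic instance the printed gap $U^\infty-\Phi^\infty$ is expected to shrink towards zero, since the LP relaxation is tight for acyclic hypergraphs.

```python
import itertools

K = [3, 3, 3]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 1.0, 0.2], [0.5, 0.0, 0.3], [0.1, 0.0, 0.9]]
th_a = {(0, 1): [[1.0, 0.0, 0.2], [0.0, 1.0, 0.0], [0.3, 0.0, 1.0]],
        (1, 2): [[0.0, 0.4, 1.0], [1.0, 0.0, 0.0], [0.2, 1.0, 0.0]]}
X = list(itertools.product(*(range(k) for k in K)))


def score(tv, ta, x):
    return sum(tv[v][x[v]] for v in range(3)) + sum(ta[a][x[a[0]]][x[a[1]]] for a in E)


def U_inf(tv, ta):
    """Zero-temperature version of (17): (+) replaced by max."""
    return sum(max(r) for r in tv) + sum(max(max(r) for r in m) for m in ta.values())


Phi_inf = max(score(th_v, th_a, x) for x in X)       # eq. (11)
tv = [r[:] for r in th_v]                             # work on a copy of theta
ta = {a: [r[:] for r in m] for a, m in th_a.items()}
history = [U_inf(tv, ta)]
for _ in range(200):
    for (u, w) in E:
        m = ta[(u, w)]
        for s in range(K[u]):                         # pair (a, u): equalize theta_u and max over x_w
            d = 0.5 * (tv[u][s] - max(m[s]))
            tv[u][s] -= d
            m[s] = [x + d for x in m[s]]
        for t in range(K[w]):                         # pair (a, w): equalize theta_w and max over x_u
            d = 0.5 * (tv[w][t] - max(m[r][t] for r in range(K[u])))
            tv[w][t] -= d
            for r in range(K[u]):
                m[r][t] += d
    history.append(U_inf(tv, ta))
print("Phi_inf =", Phi_inf, " U_inf: first", history[0], "last", history[-1])
```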

4 Bethe free energy and belief propagation 

Let the true entropy in (14) be approximated by the Bethe entropy

$$ H_B(\mu)=-\langle\log\mu,\mu\rangle+\sum_{v\in V}n_v\langle\log\mu_v,\mu_v\rangle, \tag{20}$$

where $n_v$ is the number of hyperedges containing variable $v$. For acyclic hypergraphs the Bethe entropy is equal to $H(\mu)$; otherwise it can be non-concave and even negative on $\Lambda$. Then (14) reads

$$ \max_{\mu\in\Lambda}\Big[\langle\theta-\log\mu,\mu\rangle+\sum_{v\in V}n_v\langle\log\mu_v,\mu_v\rangle\Big]. \tag{21}$$

The negative objective of (21) is the Bethe free energy.

³ The TRW-S algorithm [12] studied in [21] solves the same LP relaxation as max-sum diffusion. The same holds for zero-temperature versions of other recently proposed convergent algorithms to minimize convex free energies [9] [3] [27] [4].
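The statement that the Bethe entropy (20) is exact on acyclic hypergraphs is easy to test by brute force. The sketch below (hypothetical toy potentials; helper names `neg_entropy` and `entropies` are ours) computes the exact marginals, evaluates $H_B(\mu)=-\langle\log\mu,\mu\rangle+\sum_v n_v\langle\log\mu_v,\mu_v\rangle$, and compares it with the entropy of the Gibbs distribution, first on a chain and then on the 3-cycle obtained by adding the hyperedge (0, 2), where the two values generally differ.

```python
import itertools
import math


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


def neg_entropy(q):
    """<log q, q> for a probability vector q (terms with q = 0 contribute 0)."""
    return sum(z * math.log(z) for z in q if z > 0)


def entropies(K, E, th_v, th_a):
    """Return (entropy of the Gibbs distribution, Bethe entropy (20) of its exact marginals)."""
    X = list(itertools.product(*(range(k) for k in K)))
    score = {x: sum(th_v[v][x[v]] for v in range(len(K)))
             + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E) for x in X}
    Phi = oplus(score.values())
    p = {x: math.exp(score[x] - Phi) for x in X}
    mu_v = [[sum(p[x] for x in X if x[v] == s) for s in range(K[v])] for v in range(len(K))]
    mu_a = {a: [sum(p[x] for x in X if (x[a[0]], x[a[1]]) == (s, t))
                for s in range(K[a[0]]) for t in range(K[a[1]])] for a in E}
    n = [sum(v in a for a in E) for v in range(len(K))]          # n_v
    H_exact = -neg_entropy(p.values())
    H_bethe = (-sum(neg_entropy(row) for row in mu_v)
               - sum(neg_entropy(m) for m in mu_a.values())
               + sum(n[v] * neg_entropy(mu_v[v]) for v in range(len(K))))
    return H_exact, H_bethe


K = [2, 2, 2]
th_v = [[0.0, 0.5], [0.2, -0.3], [1.0, 0.0]]
th_a = {(0, 1): [[1.0, -1.0], [-1.0, 1.0]], (1, 2): [[0.0, 0.7], [0.7, 0.0]],
        (0, 2): [[0.8, 0.0], [0.0, 0.8]]}
chain = entropies(K, [(0, 1), (1, 2)], th_v, th_a)
cycle = entropies(K, [(0, 1), (1, 2), (0, 2)], th_v, th_a)
print("chain (exact, Bethe):", chain, " cycle (exact, Bethe):", cycle)
```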

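Next we formulate loopy belief propagation. Unlike in the 'traditional' formulation [17] [?] [33] [?], we identify messages with reparameterizations, which agrees with [?, eq. (2)] and [30]. Let the marginals $\mu$ be approximated as

$$ \log\mu_v(x_v)=\tilde\theta_v(x_v)-\bigoplus_{y_v}\tilde\theta_v(y_v),\qquad \log\mu_a(x_a)=\tilde\theta_a(x_a)-\bigoplus_{y_a}\tilde\theta_a(y_a), \tag{22}$$

where

$$ \tilde\theta_v(x_v)=\theta_v(x_v),\qquad \tilde\theta_a(x_a)=\theta_a(x_a)+\sum_{u\in a}\theta_u(x_u). \tag{23}$$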



Note that $\mu_v$ and $\mu_a$ are the Gibbs distributions of the simple graphical models with hypergraphs $(\{v\},\emptyset)$ and $(a,\{a\})$, respectively. This corresponds to decomposing $(V,E)$ into small sub-hypergraphs. In general, $\mu$ fails to satisfy the local marginalization conditions (5). Plugging (22) into these conditions yields

$$ \bigoplus_{x_{a\setminus v}}\Big[\theta_a(x_a)+\sum_{u\in a}\theta_u(x_u)\Big]=\theta_v(x_v)+\mathrm{const}_{av}, \tag{24}$$

which by cancelling $\theta_v(x_v)$ simplifies to

$$ \bigoplus_{x_{a\setminus v}}\Big[\theta_a(x_a)+\sum_{u\in a\setminus v}\theta_u(x_u)\Big]=\mathrm{const}_{av}. \tag{25}$$

Here, $\mathrm{const}_{av}$ is a constant independent of $x_v$. We define a belief propagation fixed point to be a vector $\theta$ satisfying (25) for all $v\in a\in E$ and $x_v\in X_v$. The BP algorithm then tries to reparameterize $\theta$ to make it satisfy (25).
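Condition (25) is straightforward to evaluate. The hypothetical helper `bp_residuals` below (pairwise hyperedges, finite weights; name and return convention are ours) returns, for each pair $(a,v)$, the spread (maximum minus minimum over $x_v$) of the left-hand side of (25); a vector $\theta$ is a BP fixed point when all spreads are zero. Passing `op=max` evaluates the zero-temperature condition of §4.1.

```python
import math


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


def bp_residuals(K, E, th_v, th_a, op=oplus):
    """For each pair (a, v): max minus min over x_v of the left-hand side of (25)."""
    res = {}
    for a in E:
        u, w = a
        # v = u: (+) over x_w of theta_a(x_u, x_w) + theta_w(x_w)
        lhs_u = [op([th_a[a][s][t] + th_v[w][t] for t in range(K[w])]) for s in range(K[u])]
        # v = w: (+) over x_u of theta_a(x_u, x_w) + theta_u(x_u)
        lhs_w = [op([th_a[a][s][t] + th_v[u][s] for s in range(K[u])]) for t in range(K[w])]
        res[(a, u)] = max(lhs_u) - min(lhs_u)
        res[(a, w)] = max(lhs_w) - min(lhs_w)
    return res


# Hypothetical single-edge examples.
K = [2, 2]
E = [(0, 1)]
th_v = [[0.3, -0.1], [0.0, 0.4]]
flat = {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}        # no interaction: trivially a fixed point
coupled = {(0, 1): [[1.0, -1.0], [-1.0, 1.0]]}   # generally not a fixed point as given
print(bp_residuals(K, E, th_v, flat), bp_residuals(K, E, th_v, coupled))
```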

As discovered by Yedidia et al. [33], BP fixed points (25) correspond to stationary points of problem (21) via the map (22). Heskes [5] showed that every stable BP fixed point is a local maximum (rather than a minimum or saddle point) of (21), but not necessarily vice versa.



4.1 Zero-temperature limit: max-sum belief propagation 

In the zero-temperature limit, $\oplus$ in (22)-(25) is replaced with max. We assume in §4.1 that this has been done. Then, (25) defines a fixed point of max-sum belief propagation⁴.

⁴ The traditional names 'sum-product' and 'max-product' are misnomers in our paper because we stated BP in the log-sum-exp-sum semiring $(\bar{\mathbb R},\oplus,+)$ rather than (as is usual) in the (isomorphic) sum-product semiring $(\mathbb R_+,+,\times)$. For zero temperature, we are then in the max-sum semiring $(\bar{\mathbb R},\max,+)$ rather than in the max-product semiring $(\mathbb R_+,\max,\times)$.
According to §2.2, the numbers (22) are approximations of the max-marginals $\nu^\infty$, and the solution set of the CSP defined by the active approximate max-marginals is an approximation of the set $\operatorname*{argmax}_{x_V}\langle\theta,\delta(x_V)\rangle$ of ground states⁵. Since the approximate max-marginals (22) are, up to normalization, equal to the numbers (23), this CSP is defined by $\lceil\tilde\theta\rceil$. This formulation is consistent because (as is easy to verify) the value $\langle\theta,\delta(x_V)\rangle$ is the same for all solutions $x_V$ of the CSP $\lceil\tilde\theta\rceil$.
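The consistency claim can be made explicit. For a solution $x_V$ of the CSP $\lceil\tilde\theta\rceil$, every $x_v$ maximizes $\theta_v$ and every $x_a$ maximizes $\tilde\theta_a$; since $\sum_a\sum_{u\in a}\theta_u(x_u)=\sum_v n_v\theta_v(x_v)$, definition (23) gives $\langle\theta,\delta(x_V)\rangle=\sum_a\max\tilde\theta_a-\sum_v(n_v-1)\max\theta_v$, which does not depend on the solution. The sketch below checks this by enumeration on a hypothetical 3-cycle whose unary potentials are tied so that the CSP has several solutions (the tolerance `TOL` and all names are our own).

```python
import itertools

TOL = 1e-9
K = [2, 2, 2]
E = [(0, 1), (1, 2), (0, 2)]
th_v = [[0.3, 0.3], [-0.2, -0.2], [1.0, 1.0]]
th_a = {(0, 1): [[1.0, 0.0], [0.0, 1.0]], (1, 2): [[2.0, 0.0], [0.0, 2.0]],
        (0, 2): [[0.5, 0.1], [0.1, 0.5]]}

# theta-tilde from (23): theta_a plus the unary potentials of its variables.
tt_a = {a: [[th_a[a][s][t] + th_v[a[0]][s] + th_v[a[1]][t] for t in range(K[a[1]])]
            for s in range(K[a[0]])] for a in E}
n = [sum(v in a for a in E) for v in range(len(K))]            # n_v


def active(values, value):
    """True if value is maximal among values (up to TOL)."""
    return value >= max(values) - TOL


solutions = [x for x in itertools.product(*(range(k) for k in K))
             if all(active(th_v[v], th_v[v][x[v]]) for v in range(len(K)))
             and all(active([z for row in tt_a[a] for z in row], tt_a[a][x[a[0]]][x[a[1]]])
                     for a in E)]
values = [sum(th_v[v][x[v]] for v in range(len(K))) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)
          for x in solutions]
predicted = (sum(max(z for row in tt_a[a] for z in row) for a in E)
             - sum((n[v] - 1) * max(th_v[v]) for v in range(len(K))))
print("solutions:", solutions, " values:", values, " predicted:", predicted)
```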

It is well known that the approximation of ground states obtained by max-sum belief propagation is often poor (let alone that the algorithm may not converge). In our formalism, the values $\langle\theta,\delta(x_V)\rangle$ for the solutions $x_V$ of the CSP $\lceil\tilde\theta\rceil$ are often far⁶ from $\Phi^\infty(\theta)$. It may of course also happen that the CSP $\lceil\tilde\theta\rceil$ has no solution. The situation is especially intriguing if the functions $\theta_a$ are supermodular⁷. Then maximizing $\langle\theta,\delta(x_V)\rangle$ is tractable, but the approximation obtained from max-sum BP can be inexact [25].

On the other hand, if the approximation of ground states from max-sum BP is good, then usually the approximation (22) of max-marginals is also good. This is intuitively justified by the fact that at a BP fixed point, the (max-)marginals are exact in every subtree of the factor graph [23] [24].

5 Direct minimization of the Bethe free energy 

Heskes [5] [?] proposed a class of convergent algorithms to find a local minimum of the Bethe and Kikuchi free energies, based on the minorize-maximize approach [?] [20]. We now describe a simple representative of this class, which finds a local maximum of the non-concave maximization problem (21).

Let $F(\mu)$ denote the objective of (21). A family of minorants of $F$ is constructed as

$$ \hat F(\mu,\tilde\mu)=\langle\theta-\log\mu,\mu\rangle+\sum_{v\in V}n_v\langle\log\tilde\mu_v,\mu_v\rangle, \tag{26}$$

where $\tilde\mu$ is a collection of variable distributions $\tilde\mu_v$, non-negative and normalized. For any $\mu$ and $\tilde\mu$ we have $\hat F(\mu,\tilde\mu)\le F(\mu)$, with equality if and only if $\tilde\mu_v=\mu_v$ for all $v\in V$. This follows from the well-known fact that any non-negative and normalized vectors $\mu_v$ and $\tilde\mu_v$ satisfy $\langle\log\tilde\mu_v,\mu_v\rangle\le\langle\log\mu_v,\mu_v\rangle$, which holds with equality only if $\tilde\mu_v=\mu_v$.
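The minorization property can be checked numerically. The sketch below uses a hypothetical random model on a 3-cycle; as a point of $\Lambda$ it takes a product of random unary distributions (their outer products satisfy (5)), which is our own convenient choice. It evaluates $\hat F(\mu,\tilde\mu)$ from (26) for many random $\tilde\mu$ and compares with $F(\mu)=\hat F(\mu,\mu)$.

```python
import math
import random

random.seed(4)
K = [2, 3, 2]
E = [(0, 1), (1, 2), (0, 2)]
n = [sum(v in a for a in E) for v in range(len(K))]            # n_v
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}


def random_dist(k):
    w = [random.random() + 0.01 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


# A point of Lambda: independent unary distributions and their outer products on the edges.
mu_v = [random_dist(k) for k in K]
mu_a = {a: [[mu_v[a[0]][s] * mu_v[a[1]][t] for t in range(K[a[1]])] for s in range(K[a[0]])]
        for a in E}


def term(theta, mu):
    """<theta - log mu, mu> for one potential function given as flat lists."""
    return sum((t - math.log(m)) * m for t, m in zip(theta, mu))


def F_hat(mu_tilde):
    """Minorant (26); with mu_tilde = mu_v it equals the objective F of (21)."""
    val = sum(term(th_v[v], mu_v[v]) for v in range(len(K)))
    val += sum(term([x for r in th_a[a] for x in r], [x for r in mu_a[a] for x in r]) for a in E)
    return val + sum(n[v] * sum(math.log(q) * m for q, m in zip(mu_tilde[v], mu_v[v]))
                     for v in range(len(K)))


F = F_hat(mu_v)
samples = [F_hat([random_dist(k) for k in K]) for _ in range(100)]
print("F(mu) =", F, " largest minorant value over random mu~:", max(samples))
```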

The problem (21) is now split into two nested problems

$$ \max_{\tilde\mu}\ \max_{\mu\in\Lambda}\hat F(\mu,\tilde\mu). \tag{27}$$

⁵ Decoding an assignment from a fixed point of loopy max-sum/max-product BP has been addressed in the BP literature (see e.g. [26] [25]) but has never been formulated as a CSP. But we believe this is a very natural formulation.

⁶ Note that this never happens for max-sum diffusion, where solutions of the CSP $\lceil\theta\rceil$ are inevitably ground states.

⁷ For supermodular $\theta_a$, the CSP $\lceil\tilde\theta\rceil$ always has a solution. This is easy to prove: since the functions $\theta_a$ are supermodular, the functions $\tilde\theta_a$ are supermodular as well, and then the proof proceeds like the proof [31] that max-sum diffusion exactly solves (permuted) supermodular problems.
The inner problem is a concave maximization, which can be solved optimally; in fact, it has the form (16). The objective $\max_{\mu\in\Lambda}\hat F(\mu,\tilde\mu)$ of the outer problem is a non-concave function of $\tilde\mu$, and thus we can only hope to find its local maximum. The algorithm has two nested loops, corresponding to the inner and outer problem. The outer iteration has two steps:

1. Keeping $\tilde\mu$ fixed, find $\mu\in\Lambda$ that maximizes $\hat F(\mu,\tilde\mu)$.

2. For all $v\in V$, set $\tilde\mu_v\leftarrow\mu_v$.

Neither of these two steps decreases $\hat F(\mu,\tilde\mu)$. For Step 1, this is true by definition. For Step 2, it follows from the minorization property of $\hat F$. The algorithm converges to a state where $\mu$ is the maximizer of $\hat F(\mu,\tilde\mu)$ and $\tilde\mu_v=\mu_v$; therefore, $\mu$ is a local maximum of (21).

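In Step 1, $\hat F(\mu,\tilde\mu)$ needs to be maximized over $\mu\in\Lambda$. This can be cast in the form (16). First we substitute $\log\tilde\mu=\varphi$. Note that after this substitution, the normalization condition $\sum_{x_v}\tilde\mu_v(x_v)=1$ reads $\bigoplus_{x_v}\varphi_v(x_v)=0$. Then

$$ \hat F(\mu,\tilde\mu)=\langle\bar\theta-\log\mu,\mu\rangle, \tag{28}$$

where, using that $\sum_{v\in V}n_v\langle\varphi_v,\mu_v\rangle=\sum_{a\in E}\sum_{v\in a}\langle\varphi_v,\mu_a\rangle$ for $\mu\in\Lambda$, the vector $\bar\theta$ is given by⁸

$$ \bar\theta_v(x_v)=\theta_v(x_v),\qquad \bar\theta_a(x_a)=\theta_a(x_a)+\sum_{v\in a}\varphi_v(x_v). \tag{29}$$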
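Identity (28) with $\bar\theta$ from (29), and the alternative choice $\bar\theta_v=\theta_v+n_v\varphi_v$, $\bar\theta_a=\theta_a$ mentioned in the footnote, can be sanity-checked on any $\mu\in\Lambda$. The sketch below (hypothetical random model; $\mu$ again a product of unary distributions so that it lies in $\Lambda$; $\varphi_v=\log\tilde\mu_v$ for a random $\tilde\mu$; names are ours) compares $\hat F(\mu,\tilde\mu)$ computed from (26) with $\langle\bar\theta-\log\mu,\mu\rangle$ for both choices.

```python
import math
import random

random.seed(5)
K = [2, 3, 2]
E = [(0, 1), (1, 2), (0, 2)]
n = [sum(v in a for a in E) for v in range(len(K))]
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}


def random_dist(k):
    w = [random.random() + 0.01 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


mu_v = [random_dist(k) for k in K]
mu_a = {a: [[mu_v[a[0]][s] * mu_v[a[1]][t] for t in range(K[a[1]])] for s in range(K[a[0]])]
        for a in E}
phi = [[math.log(m) for m in random_dist(k)] for k in K]        # phi_v = log mu~_v


def objective(tb_v, tb_a):
    """<theta_bar - log mu, mu> for the fixed mu in Lambda."""
    val = sum((tb_v[v][s] - math.log(mu_v[v][s])) * mu_v[v][s]
              for v in range(len(K)) for s in range(K[v]))
    return val + sum((tb_a[a][s][t] - math.log(mu_a[a][s][t])) * mu_a[a][s][t]
                     for a in E for s in range(K[a[0]]) for t in range(K[a[1]]))


# F_hat of eq. (26), written with phi.
F_hat = objective(th_v, th_a) + sum(n[v] * phi[v][s] * mu_v[v][s]
                                    for v in range(len(K)) for s in range(K[v]))
# Eq. (29): phi moved into the hyperedge potentials.
bar_a = {a: [[th_a[a][s][t] + phi[a[0]][s] + phi[a[1]][t] for t in range(K[a[1]])]
             for s in range(K[a[0]])] for a in E}
# Alternative choice: phi moved into the unary potentials with weight n_v.
bar_v = [[th_v[v][s] + n[v] * phi[v][s] for s in range(K[v])] for v in range(len(K))]
print(F_hat, objective(th_v, bar_a), objective(bar_v, th_a))
```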



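The inner problem is dualized, which changes (27) to a saddle-point problem. As described in §3, the dual is solved by reparameterizing $\bar\theta$ such that it satisfies (18) (which minimizes $U(\bar\theta)$) and then computing $\mu$ from $\bar\theta$ using (19). Since $\bar\theta^\alpha_v=\theta^\alpha_v$ and $\bar\theta^\alpha_a=\theta^\alpha_a+\sum_{v\in a}\varphi_v$, we can reparameterize $\theta$ instead of $\bar\theta$. The outer iteration now reads as follows:

1. Reparameterize $\theta$ such that, for all $v\in a\in E$ and $x_v\in X_v$,

$$ \bigoplus_{x_{a\setminus v}}\Big[\theta_a(x_a)+\sum_{u\in a}\varphi_u(x_u)\Big]=\theta_v(x_v). \tag{30}$$

2. For all $v\in V$, set $\varphi_v(x_v)\leftarrow\theta_v(x_v)-\bigoplus_{y_v}\theta_v(y_v)$.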



The number

$$ U(\bar\theta)=\sum_{v\in V}\bigoplus_{x_v}\theta_v(x_v)+\sum_{a\in E}\bigoplus_{x_a}\Big[\theta_a(x_a)+\sum_{u\in a}\varphi_u(x_u)\Big]$$

is decreased by Step 1, and it is increased by Steps 1 and 2 combined. The algorithm converges to a state where $\varphi_v(x_v)=\theta_v(x_v)-\bigoplus_{y_v}\theta_v(y_v)$. Then, $\theta$ is a BP fixed point. This is indeed obvious: since $\varphi_v$ and $\theta_v$ are equal up to an additive constant, (30) becomes the same as (24); therefore, (25) holds. If reparameterizations are represented by messages, we obtain Algorithm 2.

Let us remark that the normalization in Step 2 is not necessary; we could just set $\varphi_v\leftarrow\theta_v$. This would not affect convergence to a BP fixed point, but $U(\bar\theta)$ would lose its meaning and $\varphi_v$ might grow unbounded.

⁸ We could alternatively choose $\bar\theta$ as $\bar\theta_v=\theta_v+n_v\varphi_v$, $\bar\theta_a=\theta_a$. But since (29) directly compares to (23), the choice (29) more clearly shows the connection with BP fixed points.



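Algorithm 2 Double-loop algorithm to find a BP fixed point.
Initialization: Choose any $\varphi$ with $\bigoplus_{x_v}\varphi_v(x_v)=0$. Choose any $\alpha$.
repeat  ▷ outer loop
  repeat  ▷ inner loop
    for $v\in a\in E$ and $x_v\in X_v$ such that $\theta_v(x_v)>-\infty$ do
      $\alpha_{av}(x_v)\leftarrow\alpha_{av}(x_v)+\tfrac12\big[\theta^\alpha_v(x_v)-\bigoplus_{x_{a\setminus v}}\big(\theta^\alpha_a(x_a)+\sum_{u\in a}\varphi_u(x_u)\big)\big]$
    end for
  until convergence
  For all $v\in V$, set $\varphi_v(x_v)\leftarrow\theta^\alpha_v(x_v)-\bigoplus_{y_v}\theta^\alpha_v(y_v)$
until convergence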
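A minimal Python sketch of Algorithm 2 under the same assumptions as the sketch of Algorithm 1 (pairwise hyperedges, finite weights, fixed iteration counts in place of 'until convergence'; the names `double_loop`, `outer` and `inner` are ours). The inner loop is the diffusion update applied to $\theta^\alpha_a+\sum_{u\in a}\varphi_u$, and the outer step re-normalizes $\varphi_v$; `op=max` would give the zero-temperature version. On a chain, where the Bethe free energy is exact, the vertex pseudo-marginals (22) at the returned BP fixed point should match the exact marginals, which the usage part checks by enumeration.

```python
import itertools
import math


def oplus(xs):
    xs = list(xs)
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))


def double_loop(K, E, th_v, th_a, op=oplus, outer=200, inner=20):
    """Sketch of Algorithm 2 for pairwise hyperedges with finite weights; returns theta^alpha."""
    alpha = {(a, v): [0.0] * K[v] for a in E for v in a}
    phi = [[-op([0.0] * k)] * k for k in K]     # uniform mu~_v, normalized so that (+) phi_v = 0

    def rep_v(v, s):                             # theta^alpha_v(x_v), eq. (9)
        return th_v[v][s] - sum(alpha[(a, v)][s] for a in E if v in a)

    def rep_a(a, s, t):                          # theta^alpha_a(x_a), eq. (9)
        return th_a[a][s][t] + alpha[(a, a[0])][s] + alpha[(a, a[1])][t]

    def slice_bar(a, v, s):                      # theta^alpha_a + sum_{u in a} phi_u, with x_v = s
        u, w = a
        if v == u:
            return [rep_a(a, s, t) + phi[u][s] + phi[w][t] for t in range(K[w])]
        return [rep_a(a, r, s) + phi[u][r] + phi[w][s] for r in range(K[u])]

    for _ in range(outer):
        for _ in range(inner):                   # inner loop: enforce (30)
            for a in E:
                for v in a:
                    for s in range(K[v]):
                        alpha[(a, v)][s] += 0.5 * (rep_v(v, s) - op(slice_bar(a, v, s)))
        for v in range(len(K)):                  # outer step: phi_v <- theta^alpha_v - (+) theta^alpha_v
            row = [rep_v(v, s) for s in range(K[v])]
            z = op(row)
            phi[v] = [x - z for x in row]
    tv = [[rep_v(v, s) for s in range(K[v])] for v in range(len(K))]
    ta = {a: [[rep_a(a, s, t) for t in range(K[a[1]])] for s in range(K[a[0]])] for a in E}
    return tv, ta


# Usage on a hypothetical chain: compare the pseudo-marginals (22) with the exact marginals.
K = [2, 3, 2]
E = [(0, 1), (1, 2)]
th_v = [[0.0, 0.5], [0.2, -0.3, 0.4], [1.0, 0.0]]
th_a = {(0, 1): [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.5]], (1, 2): [[0.0, 0.7], [0.7, 0.0], [0.3, -0.2]]}
tv, ta = double_loop(K, E, th_v, th_a)
bp_mu_v = [[math.exp(x - oplus(row)) for x in row] for row in tv]     # eq. (22), vertex part

X = list(itertools.product(*(range(k) for k in K)))
score = {x: sum(th_v[v][x[v]] for v in range(3)) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)
         for x in X}
Phi = oplus(score.values())
exact_mu_v = [[sum(math.exp(score[x] - Phi) for x in X if x[v] == s) for s in range(K[v])]
              for v in range(3)]
max_err = max(abs(b - e) for rb, rx in zip(bp_mu_v, exact_mu_v) for b, e in zip(rb, rx))
print("largest error of BP vertex marginals on the chain:", max_err)
```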



The outer loop is guaranteed to converge only if the inner loop reaches full convergence. There is no theoretical guarantee of convergence with a finite number of inner iterations; this unpleasant feature is common to double-loop algorithms applied to saddle-point problems. However, this does not seem to be an issue in practice.



5.1 Zero-temperature limit 

Replacing $\theta$ with $\beta\theta$ in all the formulas and taking the limit $\beta\to\infty$ again results in replacing $\oplus$ with max. Then, Algorithm 2 converges to a max-sum belief propagation fixed point.

Though we never observed the algorithm fail to converge, its convergence (with the inner loop run to full convergence) is only a conjecture. The argument is that if it converges for every $\beta<\infty$, then it is reasonable to assume that it will also converge in the limit. But we suspect that finding a formal proof for $\beta\to\infty$ may be difficult, especially as convergence of the inner loop (max-sum diffusion) is itself only a conjecture to date. Note that, unlike for $\beta<\infty$, the proof cannot be based on the fact that the value of $U^\infty(\theta)$ monotonically decreases, because it often remains constant after the first several outer iterations.



Uniform initialization. Depending on the initial $\varphi_v$, the algorithm can converge to different fixed points (as we indeed observed). Particularly interesting is the case when the initial $\varphi_v$ are all uniform; due to the normalization condition $\max_{x_v}\varphi_v(x_v)=0$, this means $\varphi=0$. Next we focus only on this case.

Figure 1 shows how the algorithm converged for different types of pairwise interactions and different types of graph. Occasionally (e.g., for repulsive interactions and two labels), the residuals approached zero non-monotonically. The inner iteration was run to almost full convergence; however, the results were not qualitatively affected by this.

Figure 1: Convergence of the zero-temperature version of Algorithm 2 with initial $\varphi_v=0$. The horizontal axis is the number of outer iterations; the vertical axis is $\log_{10}$ of the average residuals to the max-sum BP fixed point condition (24). Each panel title is in the form 'type of pairwise interactions, number of labels, graph type'. The grid graph had 20×20 vertices and the complete graph 40 vertices. The unary potentials were generated randomly.

We made the following key observation:

If the algorithm is initialized with $\varphi=0$, then after the first outer iteration $\lceil\theta\rceil$ and $U^\infty(\theta)$ remain unchanged.

This observation is only empirical; currently we have neither a formal proof nor a counterexample. It has an important consequence. If initially $\varphi=0$, then the first outer iteration is just Algorithm 1 applied to $\bar\theta=\theta$. If all subsequent outer iterations do not change $\lceil\theta\rceil$, then the CSP $\lceil\theta\rceil$ after convergence of Algorithm 2 is the same as the CSP $\lceil\theta\rceil$ that would be obtained by running Algorithm 1 on $\theta$.
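The key observation can be explored with a zero-temperature version of the Algorithm 2 sketch. The code below (a hypothetical random 3-cycle with three labels; iteration counts, the tie tolerance `TOL` and all names are our own choices) runs Algorithm 2 with $\oplus$ replaced by max and $\varphi=0$, and records the active set $\lceil\theta^\alpha\rceil$ and $U^\infty(\theta^\alpha)$ after each outer iteration. Because the observation is empirical, the sketch only prints whether these changed after the first outer iteration; it asserts only properties that hold regardless, namely $U^\infty(\theta^\alpha)\ge\Phi^\infty(\theta)$ and $\max_{x_v}\varphi_v(x_v)=0$ after every normalization.

```python
import itertools
import random

TOL = 1e-6
random.seed(6)
K = [3, 3, 3]
E = [(0, 1), (1, 2), (0, 2)]
th_v = [[random.gauss(0, 1) for _ in range(k)] for k in K]
th_a = {a: [[random.gauss(0, 1) for _ in range(K[a[1]])] for _ in range(K[a[0]])] for a in E}
alpha = {(a, v): [0.0] * K[v] for a in E for v in a}
phi = [[0.0] * k for k in K]                   # uniform initialization: phi = 0


def rep_v(v, s):
    return th_v[v][s] - sum(alpha[(a, v)][s] for a in E if v in a)


def rep_a(a, s, t):
    return th_a[a][s][t] + alpha[(a, a[0])][s] + alpha[(a, a[1])][t]


def slice_bar(a, v, s):
    u, w = a
    if v == u:
        return [rep_a(a, s, t) + phi[u][s] + phi[w][t] for t in range(K[w])]
    return [rep_a(a, r, s) + phi[u][r] + phi[w][s] for r in range(K[u])]


def snapshot():
    """Active components of theta^alpha (within TOL) and U_inf(theta^alpha)."""
    tv = [[rep_v(v, s) for s in range(K[v])] for v in range(len(K))]
    ta = [[rep_a(a, s, t) for s in range(K[a[0]]) for t in range(K[a[1]])] for a in E]
    active = tuple(tuple(x >= max(r) - TOL for x in r) for r in tv + ta)
    return active, sum(max(r) for r in tv + ta)


history, phi_max = [], []
for _ in range(30):                             # outer loop
    for _ in range(50):                         # inner loop: max-sum diffusion on theta-bar
        for a in E:
            for v in a:
                for s in range(K[v]):
                    alpha[(a, v)][s] += 0.5 * (rep_v(v, s) - max(slice_bar(a, v, s)))
    for v in range(len(K)):                     # outer step with (+) replaced by max
        row = [rep_v(v, s) for s in range(K[v])]
        phi[v] = [x - max(row) for x in row]
        phi_max.append(max(phi[v]))
    history.append(snapshot())

Phi_inf = max(sum(th_v[v][x[v]] for v in range(3)) + sum(th_a[a][x[a[0]]][x[a[1]]] for a in E)
              for x in itertools.product(*(range(k) for k in K)))
print("active set unchanged after 1st outer iteration:", all(h[0] == history[0][0] for h in history))
print("U_inf per outer iteration:", [round(h[1], 6) for h in history])
```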

Thus, the approximate ground states obtained by Algorithm 2 are the same as those obtained by Algorithm 1. However, since Algorithm 2 converges to a max-sum BP fixed point, the approximate max-marginals obtained by Algorithm 2 are expected to be much more accurate than those obtained by Algorithm 1.

6 Conclusion 

We showed in §3.1 and §4.1 that the properties of max-sum diffusion (and all MAP inference algorithms based on LP relaxation) and max-sum belief propagation are complementary: the former yields a good approximation of ground states but a poor approximation of max-marginals, and the latter vice versa. The double-loop algorithm initialized with $\varphi=0$ combines the advantages of both: it yields approximate ground states that are exact for supermodular problems and approximate max-marginals that are exact in every subtree of the factor graph.

Our paper is primarily theoretical; more experiments are needed to compare the approximate max-marginals from the double-loop algorithm with the ground truth.

We have not pursued another potentially interesting application: the double-loop algorithm with a non-uniform initialization of $\varphi$. It is known that max-sum BP occasionally yields better approximate ground states than LP relaxation. This has been observed e.g. for some problems on highly connected graphs [?]. However, the max-sum BP algorithm does not always converge, so the convergent double-loop algorithm might be useful here.

The double-loop algorithm could be sped up by using an inner loop with tree updates as in e.g. TRW-S [12] rather than edge updates as in max-sum diffusion. We believe this is possible.